Optimizing DNN Compilation for Distributed Training With Joint OP and Tensor Fusion

Authors

Abstract

This article proposes DisCo, an automatic deep learning compilation module for data-parallel distributed training. Unlike most deep learning compilers that focus on training or inference on a single device, DisCo optimizes a DNN model for distributed training over multiple GPU machines. Existing single-device compilation strategies do not work well in distributed training, mainly due to the communication inefficiency they incur. DisCo generates optimized, joint computation operator and communication tensor fusion strategies that enable highly efficient distributed training. A GNN-based simulator is built to effectively estimate the per-iteration training time achieved by candidate operator/tensor fusion strategies. A backtracking search algorithm, driven by the simulator, navigates the large strategy space efficiently to identify good strategies that minimize per-iteration training time. We compare DisCo with existing DL fusion schemes and show that it achieves training speed-up close to the ideal, full computation-communication overlap case.
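
The abstract outlines a simulator-guided backtracking search over joint operator/tensor fusion choices. The short Python sketch below illustrates that general idea only; the function names, the toy additive cost model standing in for the GNN-based simulator, and the pruning rule are assumptions made for illustration, not DisCo's actual implementation.

# Hypothetical sketch of a simulator-driven backtracking search over joint
# op/tensor fusion choices. Names and the toy cost model are illustrative
# assumptions, not the actual DisCo implementation.

def backtracking_search(decisions, simulate):
    """Enumerate fusion choices one decision point at a time, pruning any
    partial plan whose simulated per-iteration time already exceeds the best
    complete plan found so far."""
    best = None  # (estimated_time, plan)

    def recurse(index, partial_plan):
        nonlocal best
        if index == len(decisions):
            est = simulate(partial_plan)  # stand-in for the GNN-based simulator
            if best is None or est < best[0]:
                best = (est, list(partial_plan))
            return
        for choice in decisions[index]:
            partial_plan.append(choice)
            # Pruning is valid here only because the toy cost model below is
            # additive, so extending a partial plan never reduces its cost.
            if best is None or simulate(partial_plan) < best[0]:
                recurse(index + 1, partial_plan)
            partial_plan.pop()

    recurse(0, [])
    return best


if __name__ == "__main__":
    # Four decision points, each choosing whether to fuse a group of ops/tensors.
    decisions = [["fuse", "no_fuse"]] * 4

    def simulate(plan):
        # Toy additive cost: fused groups are cheaper than unfused ones.
        return sum(1.0 if c == "fuse" else 2.5 for c in plan)

    print(backtracking_search(decisions, simulate))  # (4.0, ['fuse', 'fuse', 'fuse', 'fuse'])

The pruning step is safe here only because the toy cost model is monotone under plan extension; a learned simulator need not have this property, so a real search would require a different pruning criterion.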

Related articles

Optimizing Dendritic Cell Preparation for Fusion with Melanoma Cells

Background: Fusion of dendritic cells (DCs) with melanoma cells could reinforce the antigenicity of tumors as a strategy for the treatment of malignant melanoma. However, the insufficient quantity of DCs and the low fusion efficiency limit the development of such an approach. Objective: To define the dosage of the stimulating factors as well as the induction condition for the optimal DCs prepara...

DNN-Train: Benchmarking and Analyzing DNN Training

We aim to build a new benchmark pool for deep neural network training and to analyze how efficient existing frameworks are in performing this training. We will provide our methodology and develop proper profiling tools to perform this analysis.

Scalable distributed DNN training using commodity GPU cloud computing

We introduce a new method for scaling up distributed Stochastic Gradient Descent (SGD) training of Deep Neural Networks (DNN). The method solves the well-known communication bottleneck problem that arises for data-parallel SGD because compute nodes frequently need to synchronize a replica of the model. We solve it by purposefully controlling the rate of weight-update per individual weight, whic...

GMM-Free DNN Training

While deep neural networks (DNNs) have become the dominant acoustic model (AM) for speech recognition systems, they are still dependent on Gaussian mixture models (GMMs) for alignments both for supervised training and for context dependent (CD) tree building. Here we explore bootstrapping DNN AM training without GMM AMs and show that CD trees can be built with DNN alignments which are better ma...

Optimizing fusion architectures for limited training data sets

A method is described to improve the performance of sensor fusion algorithms. Data sets available for training fusion algorithms are often smaller than desired, since the sensor suite used for data acquisition is always limited by the slowest, least reliable sensor. In addition, the fusion process expands the dimension of the data, which increases the requirement for training data. By using str...

Journal

Journal title: IEEE Transactions on Parallel and Distributed Systems

Year: 2022

ISSN: 1045-9219, 1558-2183, 2161-9883

DOI: https://doi.org/10.1109/tpds.2022.3201531